-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
otel: fix flakiness and various issues in TestFBOtelRestartE2E #6819
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
adc2425
to
d8d9cb2
Compare
Moving this to draft since it requires work done in beats via elastic/beats#42412 . I need to bump the beats dependency in go.mod #6837. |
dc95b0a
to
d04606e
Compare
I'm repurposing this PR to include a series of fixes for the otel tests. Having the fixes as a batch as oposed to separate PRs speeds up the continuous integration builds. |
This test starts the collector with a timeout, but the error returned is not always a context cancelled, sometimes it returns err == nil, which is also fine, just not handled properly. While at it, fix some other issues I found while testing: - Using require inside a goroutine calls runtime.GoExit on failure, meaning the test exits immediatelly without doing any cleanup. Use assert in those cases.
d04606e
to
95c25c9
Compare
Pull request was converted to draft
ce3fb39
to
ac211c6
Compare
/test |
fleet airgapped tests are the one failing and are known issues, retriggering the tests. |
Thanks. I'm yet to see a test run that succeeded in the last week or so. It seems that every single retry the same exact set of fleet tests fail. |
💔 Build Failed
Failed CI StepsHistory
cc @mauri870 |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nitpicks on the assertion functions, overall it looks good
_, err = inputFile.Write([]byte("\n")) | ||
require.NoErrorf(t, err, "failed to write newline to temp file") | ||
assert.NoErrorf(t, err, "failed to write newline to temp file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert.NoErrorf(t, err, "failed to write newline to temp file") | |
assert.NoError(t, err, "failed to write newline to temp file") |
@@ -1519,7 +1513,7 @@ service: | |||
go func() { | |||
err = fixture.RunOtelWithClient(fCtx) | |||
cancel() | |||
require.True(t, errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled), "unexpected error: %v", err) | |||
assert.True(t, err == nil || errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled), "unexpected error: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since we are not testing a boolean we can use https://pkg.go.dev/github.com/stretchr/testify/assert#Conditionf with a small function where we can also do some additional logging if necessary
_, found = uniqueIngestedLogs[msg] | ||
require.False(t, found, "found duplicated log message %q", msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason why we can't use assert.NotContains here?
_, found = uniqueIngestedLogs[msg] | |
require.False(t, found, "found duplicated log message %q", msg) | |
require.NotContainsf(uniqueIngestedLogs, msg, "found duplicated log message %q", msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all, I just didn't knew about this function. TIL.
Pull request was converted to draft
Leaving this as draft since we had to remove all the tests as a part of a last ditch fix for EDOT in v9.0. I will revisit this once #7023 is reverted. |
What does this PR do?
This test starts the collector with a timeout, but the error returned is not
always a context cancelled, sometimes it returns err == nil, which is also
fine, just not handled properly.
While at it, fix some other issues I found while testing:
the test exits immediatelly without doing any cleanup, causing resource leaks. Use assert in those
cases.
Checklist
./changelog/fragments
using the changelog toolRelated issues